The prediction of vertebrate promoter regions using differential hexamer frequency analysis

Author

  • G. B. Hutchinson
Abstract

MOTIVATION: To develop an algorithm that uses differential hexamer frequency analysis to discriminate promoter from non-promoter regions in vertebrate DNA sequence, without relying upon an extensive database of known transcriptional elements. RESULTS: Using hexamer frequencies derived from known promoter regions, coding regions and non-coding regions in vertebrate DNA sequence, together with a formula first applied by Claverie and Bougueleret (1986), a discriminant measure was created that compares promoter regions with coding (D1) and non-coding (D2) sequence. The algorithm correctly identifies the promoter regions in 18 of 29 loci (62.1%) from an independent test data set. With program options set to identify only one promoter region in the forward strand, there are 11 false-positive predictions in 208 714 nucleotides (one false positive per 18 974 single-stranded bp). With options set to analyze sequence in discrete segments, there is no appreciable improvement in sensitivity, whereas the specificity falls off predictably. It is of particular interest that a search for a peak score (independent of an absolute threshold) is more accurate than a search based upon a fixed scoring threshold. This suggests that the selection of promoter sites may be influenced by the global properties of an entire sequence domain, rather than exclusively by local characteristics.
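The idea in the abstract can be illustrated with a minimal Python sketch. It is not the author's program: the abstract does not state the Claverie and Bougueleret formula, so the per-hexamer ratio `p / (p + q)` used here is an assumed stand-in, and every function name, the window width, the step size and the toy training sequences are hypothetical. Passing a coding-derived table as `f_neg` would correspond loosely to D1, and a non-coding table to D2. The sketch also contrasts the two search modes the abstract compares: taking the single peak-scoring window versus reporting every window above a fixed threshold.

```python
from collections import Counter

K = 6  # hexamer length


def hexamer_freqs(seqs, k=K):
    """Relative frequency of each overlapping k-mer in a training set."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def window_score(window, f_pos, f_neg, k=K, eps=1e-6):
    """Mean of p/(p+q) over the window's hexamers.

    ASSUMPTION: this ratio is an illustrative discriminant, not the
    published formula. eps is a pseudo-frequency for unseen hexamers.
    """
    vals = []
    for i in range(len(window) - k + 1):
        kmer = window[i:i + k]
        p = f_pos.get(kmer, 0.0) + eps
        q = f_neg.get(kmer, 0.0) + eps
        vals.append(p / (p + q))
    return sum(vals) / len(vals) if vals else 0.5


def scan(seq, f_pos, f_neg, width=60, step=10):
    """Score sliding windows; returns a list of (start, score) pairs."""
    seq = seq.upper()
    last = max(len(seq) - width, 0)
    return [(start, window_score(seq[start:start + width], f_pos, f_neg))
            for start in range(0, last + 1, step)]


def peak_call(scores):
    """Peak search: the single best window, independent of any cutoff."""
    return max(scores, key=lambda t: t[1])


def threshold_calls(scores, cutoff):
    """Threshold search: every window scoring at or above a fixed cutoff."""
    return [t for t in scores if t[1] >= cutoff]


# Toy data (hypothetical, for illustration only).
promoter_like = ["TATAAA" * 10]
coding_like = ["ACGTGC" * 10]
f_prom = hexamer_freqs(promoter_like)
f_cod = hexamer_freqs(coding_like)

# A 360 bp toy locus with a promoter-like stretch at positions 120-239.
locus = "ACGTGC" * 20 + "TATAAA" * 20 + "ACGTGC" * 20
scores = scan(locus, f_prom, f_cod)
best_start, best_score = peak_call(scores)
above = threshold_calls(scores, 0.9)
```

In this toy locus the peak search returns one window inside the promoter-like stretch, while the threshold search returns every window that clears the cutoff, so its output depends directly on where the cutoff is set.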

Similar resources

Tidal prediction using time series analysis of Buoy observations

Although tidal observations extracted from coastal tide gauges have higher accuracy due to their higher sampling rate, installing these gauges imposes spatial limitations, since they cannot be placed in every part of the sea. To overcome this limitation, satellite altimetry observations can be used. However, satellite altimetry observations have a lower sampling rate. Acco...

Mutations in the Basal Core Promoter and Precore/Core Regions of Hepatitis B Virus in Patients Co-Infected With Human Immunodeficiency Virus

ABSTRACT Background and objectives: Globally, about one third of the population has been infected with hepatitis B virus (HBV), and more than 400 million people have become chronically infected. Nearly 20-25% of all carriers develop serious liver diseases such as cirrhosis, chronic hepatitis and hepatocellular carcinoma (HCC). According to t...

Allelic Variation of VRN-1 Locus in Iranian Wheat Landraces

Wheat is a crop with spring and winter types and wide adaptability to different climate conditions. This adaptability is mainly controlled by three groups of genetic factors, among which the vernalization (VRN) genes play a pivotal role in determining spring and winter types. In this study, 395 Iranian wheat landraces were characterized with specific primer pairs designed based on VRN-...

Mutational analysis of the histidine operon promoter of Salmonella typhimurium.

We isolated a collection of 67 independent, spontaneous Salmonella typhimurium his operon promoter mutants with decreased his expression. The mutants were isolated by selecting for resistance to the toxic lactose analog o-nitrophenyl-beta-D-thiogalactoside in a his-lac fusion strain. The collection included base pair substitutions, small insertions, a deletion, and one large insertion identifie...

Finding Coding Region Using Secondary Hexamer Measure and Two-Dimensional Linear Discriminant Analysis

We have developed a coding region prediction system. It is constructed from several measures that indicate exonness of a region in DNA sequence. The system includes a new statistical measure called secondary hexamer measure which we have developed. In addition to the measure, several measures are combined by two-dimensional linear discriminant analysis (2D-LDA). Then the system outputs a best g...

Journal:
  • Computer applications in the biosciences : CABIOS

Volume 12, Issue 5

Publication date: 1996